Abstract

For human beings, processing text streams of unknown length is generally demanding: noise
must be filtered out, information must be tested for relevance or redundancy, and linguistic
phenomena such as ambiguity must be handled and pronouns resolved. Simulating this with an
artificial mind-map is a challenge that opens the door to a wide range of applications, such
as automatic text summarization or selective retrieval. In this work we present a framework
that is a first step towards an automatic intellect. It assembles a mind-map from incoming
text streams following a subject-verb-object strategy, in which the verb interconnects the
adjacent nouns. The mind-map is enriched by a pronoun resolution engine based on the work
of [2].

1 Introduction 

A text stream is a data flow that is lost once it has been read. Such streams are very
common in practice, for example when reading a text or listening to a story told by someone
else. In both cases, human beings store the major events associatively. First, they remove
noise and then extract information, which can be either relevant or redundant/obvious.
Relevant information is then connected very adaptively: if the same information is read or
heard again, the association between the co-occurring words is strengthened (and weakened if
it does not recur). In this way, unimportant information is lost whereas important facts are
kept. This matters because it makes constructive processing, such as generating a summary of
the text or retrieving its contents, manageable.

Incremental-adaptive mind-maps work in a similar way, as they simulate this human
performance: through their associative, incremental, and adaptive architecture they process
incoming data streams, adapt internal structures to the given input, strengthen or weaken
internal connections, and pass longer-established connections on to a simulated short-
and/or long-term memory. In this respect, we build on the work of [1], which argues for a
real-time approach to finding associative relationships between categorical entities in
transactional data streams. Technically, these categorical entities are represented as
connectionist cells, while associations are represented by links between them. These links
may become stronger over time or degrade, depending on whether the association recurs or is
not observed for a while. The work suggests a three-layer architecture: in the first layer,
the short-term memory treats the incoming signals and constructs the associations. The
second layer, called the long-term memory, stores associations that have a strong connection
and that may be useful for further analysis. The last layer, the action layer, serves as a
communication interface through which the user can consult the current state of the system
and interact with it.
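The link dynamics of the cited three-layer architecture can be illustrated with a minimal Python sketch. It assumes that each processed unit (for instance a sentence) yields a set of co-occurring entity pairs; the class name `AssociativeMemory`, the additive gain, the multiplicative decay, the promotion threshold and the forgetting threshold are all hypothetical choices, not parameters taken from the cited work.

```python
from collections import defaultdict


class AssociativeMemory:
    """Hypothetical sketch of the three-layer idea: short-term links are
    strengthened on re-occurrence, decay otherwise, and strong ones are
    promoted to long-term memory."""

    def __init__(self, gain=1.0, decay=0.5, promote_at=3.0, forget_below=0.1):
        self.gain = gain                  # added each time an association re-occurs
        self.decay = decay                # multiplicative decay when it is not observed
        self.promote_at = promote_at      # weight needed to enter long-term memory
        self.forget_below = forget_below  # weaker links are dropped from short-term memory
        self.short_term = defaultdict(float)  # layer 1: (entity, entity) -> weight
        self.long_term = {}                   # layer 2: promoted associations

    def observe(self, pairs):
        """Process one unit of input (e.g. a sentence) given as co-occurring pairs."""
        seen = {tuple(sorted(pair)) for pair in pairs}
        for link in seen:
            self.short_term[link] += self.gain
        for link in list(self.short_term):
            if link not in seen:
                self.short_term[link] *= self.decay
                if self.short_term[link] < self.forget_below:
                    del self.short_term[link]
        for link, weight in self.short_term.items():
            if weight >= self.promote_at:
                self.long_term[link] = weight

    def query(self, entity):
        """Layer 3 (action layer): long-term associations involving an entity."""
        return {link: w for link, w in self.long_term.items() if entity in link}
```

Observing the pair (Malcolm, bone) three times promotes it to long-term memory; a later sentence without that pair only lets its short-term weight decay, while the promoted entry remains available to the action layer.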

The generation of such a mind-map is complicated by the fact that the incoming text can be
corrupt or even ambiguous. For example, pronouns create ambiguity between persons that exist
or are referenced in the text: given "The President of the United States has said that ..."
followed by "Furthermore, he has mentioned that ...", both clearly refer to the same person,
but recognizing such relationships automatically is not trivial. If such relationships are
left unresolved, the mind-map can become ineffective or even wrong. A significant part of the
mind-map described in this work is therefore concerned with the resolution of pronouns. Here
we draw on earlier work, notably a syntax-based approach [11]: all possible candidates for a
pronoun are evaluated on a set of salience factors, such as recency or subject emphasis, and
the candidate with the highest salience weight is chosen as the antecedent. [12] presents a
similar approach in which the candidates are evaluated on indicators, but no syntactic or
semantic information about the sentence is needed. Furthermore, the mind-map handles the
temporal management of text streams in order to construct an actor-based structure.
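A salience-based antecedent choice of the kind attributed above to the syntax-based approach can be sketched as follows. The factors and weights in `WEIGHTS`, the halving of the recency weight for each sentence between candidate and pronoun, and the `Candidate` record are illustrative assumptions, not the values or data structures of the cited work.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical salience factors and weights (illustrative only).
WEIGHTS = {"recency": 100.0, "subject": 80.0, "direct_object": 50.0}


@dataclass
class Candidate:
    text: str
    sentence_distance: int          # 0 = same sentence as the pronoun
    is_subject: bool = False
    is_direct_object: bool = False


def salience(c: Candidate) -> float:
    # Recency: full weight in the pronoun's sentence, halved per sentence back.
    score = WEIGHTS["recency"] / (2 ** c.sentence_distance)
    if c.is_subject:
        score += WEIGHTS["subject"]         # subject emphasis
    if c.is_direct_object:
        score += WEIGHTS["direct_object"]
    return score


def choose_antecedent(candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the candidate with the highest salience weight (None if there is none)."""
    return max(candidates, key=salience, default=None)
```

For "he" in the President example, a subject one sentence back scores 50 + 80 = 130 and beats a direct object at the same distance (50 + 50 = 100).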

2 Architecture of the Mind-map 

The purpose of pronoun resolution in learning the semantic network is to find the correct
antecedent for each pronoun, which is necessary to construct complete mind-maps for each
actor in a text. The text stream is processed through a sliding window, which buffers and
processes a certain number of sentences; once information has been read, it is lost. For
each sentence in the sliding window, a predefined subject-verb-object structure is
instantiated and arranged in a semantic network of concepts and connections between them.
The connections become stronger or weaker according to the underlying text stream, i.e.,
according to how often the subject-verb-object instantiation occurs.
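The sliding-window processing can be sketched in Python, assuming that each sentence has already been reduced to (subject, verb, object) triples. The function `build_mind_map`, the window size of 3, the optional `resolve` hook and the toy resolver `last_subject` are hypothetical; weakening of connections is omitted. The point is that only the window is kept as look-back context, while the mind-map itself retains only connection weights.

```python
from collections import Counter, deque
from typing import Callable, Deque, Iterable, List, Tuple

Triple = Tuple[str, str, str]
Resolver = Callable[[Triple, Deque[List[Triple]]], Triple]


def build_mind_map(stream: Iterable[List[Triple]], window_size: int = 3,
                   resolve: Resolver = lambda triple, window: triple) -> Counter:
    """Count (subject, verb, object) connections while sliding a window over the stream."""
    window: Deque[List[Triple]] = deque(maxlen=window_size)  # oldest sentence falls out
    weights: Counter = Counter()
    for sentence in stream:
        resolved = [resolve(t, window) for t in sentence]  # window = look-back context
        for triple in resolved:
            weights[triple] += 1        # each re-occurrence strengthens the connection
        window.append(resolved)
    return weights                      # the raw sentences themselves are not kept


def last_subject(triple: Triple, window: Deque[List[Triple]]) -> Triple:
    """Toy resolver (hypothetical): 'he' becomes the subject of the previous sentence."""
    subj, verb, obj = triple
    if subj == "he" and window and window[-1]:
        subj = window[-1][0][0]
    return (subj, verb, obj)
```

With the toy resolver, the stream "Malcolm picked up bone / he set down bone / he picked up bone" yields the connection Malcolm - picked up - bone with weight 2.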

Figure 1 shows the general architecture of the mind-map. First, the complete text, i.e.,
each sentence, is preprocessed in order to obtain the syntactic and semantic information
needed to treat the input further. Next, pronouns, which are used as substitutes for nouns,
are resolved; for example, the pronoun he refers back to Harry in a sentence like "Harry goes
to the zoo where he looks at the beautiful animals." Then, a predefined structure of
subject(s), verb(s) and object(s) is extracted from each sentence, together with the
adjective(s) attached to subjects and objects. These extracted elements form the essence of
the sentence. Finally, co-reference resolution merges concepts that refer to the same
entity; for example, the concepts President Washington and George Washington both refer to
the same person. Co-reference resolution is, however, limited to the actors of the text.

Figure 1: Architecture

3 Accuracy - Pronoun Resolution 

From our experience, looking back at the most important concepts for each category of text,
where most important means having the most outgoing edges, we observed that in stories these
concepts are generally the actors, whereas in biographies and news articles the most
important concept is the person the text is about. In scientific texts, the actors are often
not the most frequently occurring concepts. Regarding structures that occur multiple times
within a text stream, subject-verb structures recur more often than subject-verb-object
structures. The subject-verb structures that occur multiple times mostly contain a verb of
cognition or communication, for instance say, think or explain.
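Both observations can be computed directly from the mind-map's edges. This sketch assumes the mind-map is available as a list of (subject, verb, object) edges with `None` as the object of a pure subject-verb structure; the function names `most_important` and `recurring_structures` are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

Edge = Tuple[str, str, Optional[str]]  # (subject, verb, object or None)


def most_important(edges: List[Edge], top: int = 1):
    """Rank concepts by their number of distinct outgoing edges."""
    out_degree = Counter(src for src, _verb, _dst in set(edges))
    return out_degree.most_common(top)


def recurring_structures(edges: List[Edge]):
    """Return subject-verb and subject-verb-object structures that occur more than once."""
    sv = Counter((s, v) for s, v, o in edges if o is None)
    svo = Counter((s, v, o) for s, v, o in edges if o is not None)
    return ({k: n for k, n in sv.items() if n > 1},
            {k: n for k, n in svo.items() if n > 1})
```

Counting distinct edges for importance (rather than every occurrence) is a design choice of this sketch, so that a single repeated statement does not dominate the ranking.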

Regarding the accuracy of the pronoun resolution, i.e., how many pronouns are resolved
correctly or wrongly or remain unresolved (see Table 1), we used texts from different
domains: fairy tales, news articles, biographies and scientific articles. The resolution of
third-person singular pronouns such as he and she proved rather successful. Only the
resolution of it and they yields insufficient accuracy, which calls for an alternative method.
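Table 1 reports counts of correct, wrong and unresolved resolutions. As a hedged illustration of how such counts can be turned into rates, the hypothetical helper below computes each share of all pronouns plus a precision over the pronouns the engine actually resolved; the numbers in the usage note are made up and are not results from Table 1.

```python
def resolution_rates(correct: int, wrong: int, unresolved: int) -> dict:
    """Turn raw pronoun counts into rates (hypothetical helper)."""
    total = correct + wrong + unresolved
    attempted = correct + wrong
    if total == 0:
        raise ValueError("no pronouns were counted")
    return {
        "correct": correct / total,        # share of all pronouns resolved correctly
        "wrong": wrong / total,
        "unresolved": unresolved / total,
        # precision over the pronouns the engine actually resolved
        "precision": correct / attempted if attempted else 0.0,
    }
```

For made-up counts of 6 correct, 2 wrong and 2 unresolved, the correct share is 0.6 while the precision is 0.75, which shows why unresolved pronouns should be reported separately.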
Table 1: Resolving the pronouns: correct, wrong, and unresolved.



4 Implementation 

For the implementation, we use a graphical user interface through which the user can, for
example, set the window size, specify the actors in the text, and inspect the different
outputs of the program, such as the sub-mind-map related to each actor or various actor
statistics. For the preprocessing of the text streams, we additionally need

• the tagged text, which makes it possible to extract all nouns, proper nouns and pronouns.

• the parse tree, which gives more information about the constituents of each sentence, 
as for example the clauses. 
• the grammatical relations between the single words of a sentence, relating for example a
subject noun to its corresponding verb.
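The first item above amounts to filtering tokens by part-of-speech tag. The sketch below assumes the tagger output is a list of (word, tag) pairs using Penn Treebank tags (NN/NNS for nouns, NNP/NNPS for proper nouns, PRP for personal pronouns); the function name `filter_tagged` is hypothetical and no particular tagger API is implied.

```python
from typing import Dict, List, Tuple

# Penn Treebank tags for the three categories named above.
CATEGORIES = {
    "nouns": {"NN", "NNS"},
    "proper_nouns": {"NNP", "NNPS"},
    "pronouns": {"PRP"},           # personal pronouns; possessive PRP$ not included
}


def filter_tagged(tagged: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group the words of a POS-tagged sentence into nouns, proper nouns and pronouns."""
    result: Dict[str, List[str]] = {name: [] for name in CATEGORIES}
    for word, tag in tagged:
        for name, tags in CATEGORIES.items():
            if tag in tags:
                result[name].append(word)
    return result
```

For the tagged sentence "Harry goes to the zoo where he looks at the beautiful animals", this yields Harry as a proper noun, zoo and animals as nouns, and he as a pronoun.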




Figure 2: Mind-map for John sees the yellow lion.

With this information, pronoun resolution works as illustrated by the following examples:

• he/she: we take the last male/female noun or name occurring before the pronoun 
that acts as a subject in the sentence. If there is none, we take the last male/female 
noun or name before the pronoun. 

• they: we look back at the last two sentences and take the last plural before the pronoun.
Plurals are either plural nouns (e.g. the women, the children, the cars) or noun phrases
containing nouns connected by and or by a comma (e.g. John and Paul).

• it: we first detect whether it is pleonastic. A pleonastic it has no antecedent, as for
example in the phrase "It can be seen that ...". This detection uses a set of fixed sentence
structure patterns (taken from [3]). If it is not pleonastic, we take the last non-living
object occurring before the pronoun that is part of a non-prepositional phrase.
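The three heuristics can be sketched in Python. The `Mention` record (with gender, number, animacy, subject role and prepositional-phrase flags) is assumed to be filled by the preprocessing step, and the two regular expressions are hypothetical stand-ins for the fixed pleonastic patterns of [3]. Reading "the last two sentences" as the pronoun's sentence plus the two before it (`lookback=2`), and restricting it to singular antecedents, are also our assumptions.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    text: str
    sentence: int                  # index of the sentence in the stream
    gender: Optional[str] = None   # "m", "f" or None
    plural: bool = False
    living: bool = True
    is_subject: bool = False
    in_prep_phrase: bool = False


# Hypothetical stand-ins for the fixed pleonastic patterns taken from [3].
PLEONASTIC = [re.compile(p, re.IGNORECASE) for p in (
    r"\bit (is|was) \w+ that\b",
    r"\bit can be \w+ that\b",
)]


def resolve(pronoun: str, sentence_text: str, sentence_idx: int,
            previous: List[Mention], lookback: int = 2) -> Optional[Mention]:
    """Return the antecedent among `previous` (mentions before the pronoun, in order)."""
    p = pronoun.lower()
    if p in ("he", "she"):
        gender = "m" if p == "he" else "f"
        matches = [m for m in previous if m.gender == gender and not m.plural]
        subjects = [m for m in matches if m.is_subject]
        pool = subjects or matches      # prefer the last subject, else the last match
    elif p == "they":
        pool = [m for m in previous
                if m.plural and sentence_idx - m.sentence <= lookback]
    elif p == "it":
        if any(rx.search(sentence_text) for rx in PLEONASTIC):
            return None                 # pleonastic "it" has no antecedent
        pool = [m for m in previous     # singular restriction is our assumption
                if not m.living and not m.plural and not m.in_prep_phrase]
    else:
        return None
    return pool[-1] if pool else None
```

In the Malcolm example, "garden" occurs inside a prepositional phrase ("to the other side of the garden"), so "it" in "He picked it up" falls back to "bone".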

To extract the subject-verb-object structure from each sentence, the grammatical relations
described in [9] are used. For the sentence

John sees the yellow lion

these relations are

nsubj(sees-2, John-1)
det(lion-5, the-3)
amod(lion-5, yellow-4)
dobj(sees-2, lion-5)

Figure 3: The workbench.

The relation nsubj (nominal subject) relates the noun John to the corresponding verb sees,
whereas the relation dobj (direct object) relates this verb to the object lion. In this way,
the sentence elements are extracted and the sentence structure can be translated into the
mind-map. All subjects and objects take on the role of concepts, whereas the verbs serve as
connections between the concepts. The adjectives represent sub-concepts of subjects and
objects. Graphically, actors are represented as double circles, while concepts that do not
represent actors are drawn as boxes. The sub-concepts (adjectives) are drawn as diamonds.
Concepts are linked by a directed arrow labeled with the verb that relates the subject to the
object. An example can be seen in Figure 2, which represents the sentence John sees the
yellow lion.
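The translation from typed dependencies to the mind-map can be sketched as follows. The regular expression assumes relations printed in the `rel(head-i, dependent-j)` form shown above; the functions `extract` and `to_mind_map`, the shape names, and the `(empty)` placeholder for a missing object are hypothetical.

```python
import re
from collections import defaultdict

# Matches typed dependencies such as "nsubj(sees-2, John-1)".
DEP = re.compile(r"(\w+)\s*\((\S+)-(\d+),\s*(\S+)-(\d+)\)")


def extract(dependencies):
    """Return (subject, verb, object) triples and the adjectives of each noun."""
    subj, obj = {}, {}
    adjectives = defaultdict(list)
    for line in dependencies:
        m = DEP.match(line.strip())
        if not m:
            continue
        rel, head, hidx, dep, didx = m.groups()
        h, d = (head, int(hidx)), (dep, int(didx))  # key by position, not just by word
        if rel == "nsubj":
            subj[h] = d
        elif rel == "dobj":
            obj[h] = d
        elif rel == "amod":
            adjectives[head].append(dep)
    triples = []
    for verb, s in subj.items():
        o = obj.get(verb)
        triples.append((s[0], verb[0], o[0] if o else None))
    return triples, dict(adjectives)


def to_mind_map(triples, adjectives, actors):
    """Map triples onto nodes (with their drawing shape) and labelled edges."""
    nodes, edges = {}, []
    for s, v, o in triples:
        o = o if o is not None else "(empty)"   # no object: empty concept
        for concept in (s, o):
            nodes[concept] = "double_circle" if concept in actors else "box"
        edges.append((s, v, o))                 # the verb labels the directed edge
    for noun, adjs in adjectives.items():
        for adj in adjs:
            if noun in nodes:
                nodes[adj] = "diamond"          # adjectives become sub-concepts
                edges.append((noun, None, adj))
    return nodes, edges
```

For John sees the yellow lion with John registered as an actor, this gives a double-circle John, a box lion, a diamond yellow, and the edge John -sees-> lion.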

To merge concepts that refer to the same actor, we use an incremental actor-based thesaurus.
Since the user can enter information about each actor in advance, for example the first
name, the last name, nicknames, etc., we use this external information to establish the
thesaurus. Following the spirit of [H], the concepts are then matched.
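A minimal sketch of the actor thesaurus, assuming the user registers each actor with a canonical name and a list of variants. The class `ActorThesaurus` is hypothetical, and the fallback that matches a concept via any single known variant word (so that President Washington maps to George Washington) is our simplification, not the cited matching method.

```python
class ActorThesaurus:
    """Hypothetical incremental thesaurus mapping name variants to one actor."""

    def __init__(self):
        self._index = {}  # lower-cased variant -> canonical actor name

    def add_actor(self, canonical, *variants):
        for name in (canonical, *variants):
            self._index[name.lower()] = canonical

    def canonical(self, concept):
        """Return the actor a concept refers to, or the concept itself if unknown."""
        key = concept.lower()
        if key in self._index:
            return self._index[key]
        for token in key.split():      # simplification: match any known variant word
            if token in self._index:
                actor = self._index[token]
                self._index[key] = actor   # incremental: remember the new variant
                return actor
        return concept
```

The word-level fallback is deliberately naive: with two actors sharing a first name it would pick whichever was registered last for that word, so a real implementation would need a tie-breaking rule.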

Figure 3 presents the implemented user interface, which consists of different components:
the technical part (left, including processing information, graph options, and actor
statistics), the monitoring part (below, including the last parsed sentences and information
about each node), and the notes part (for writing and saving one's own comments). The
workbench also provides help buttons.

5 An Example 

The following text is an extract from the children's story Malcolm the Scotty Dog. In this
example, the focus is on an actor called Malcolm. The text is processed sentence by
sentence, starting with

Malcolm picked the bone up and ran over to the other side of the garden.

The mind-map for the actor Malcolm after this sentence can be seen in Figure 4. The actor
Malcolm is placed at the centre, pointing to the concepts bone and side of garden. The latter
concept is characterised by a sub-concept called other. After the following sentence, the
mind-map of Malcolm has evolved as shown in Figure 5.

He set the bone down and looked around. 

We observe that he has been resolved to Malcolm. An empty concept is added since looked 
around does not imply an object. The concept bone is stimulated again by set down (new 
concept) and connected to it. With 

He picked it up and could not wait to taste it. 

both occurrences of it have been replaced by bone (Figure 6). The negated verb could not wait
is specially marked in the mind-map by an inhibitory arrow. The phrase he picked up the bone
has re-occurred in the text stream. To mark this re-occurrence in the mind-map, the
structure Malcolm - picked up - bone has been reinforced (by means of a straight line). The
user can also display the mind-maps at certain depths. With a depth of 1, only the concepts
directly related to the actor are shown, while with a depth of 2, all concepts within a
distance of two nodes are displayed. This can be illustrated by processing the following
sentence.

The bone was big and it tasted delicious. 

With a depth of 1, the mind-map of Malcolm is as shown in Figure 6. With a depth of 2,
however, the mind-map looks as shown in Figure 7. Here, the concept bone is described in more
detail. The user thus decides how detailed the mind-map should be. Figure 8 shows the
mind-map after a larger number of sentences has been processed.
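Displaying a mind-map at a given depth amounts to a bounded breadth-first search from the actor. This sketch assumes the mind-map is stored as (source, label, target) edges, treats a depth of 2 as including the depth-1 concepts, and follows arrows only in their direction; the function name `neighbourhood` is hypothetical.

```python
from collections import deque
from typing import Iterable, Optional, Set, Tuple


def neighbourhood(edges: Iterable[Tuple[str, Optional[str], str]],
                  actor: str, depth: int) -> Set[str]:
    """Concepts reachable from `actor` by following at most `depth` arrows."""
    adjacency = {}
    for src, _label, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
    distance = {actor: 0}
    queue = deque([actor])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue                      # do not expand beyond the requested depth
        for nxt in adjacency.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return set(distance) - {actor}
```

On a toy version of the Malcolm map, depth 1 returns only bone and side of garden, while depth 2 adds their sub-concepts and the concepts attached to bone.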
Figure 4: Mind-map of Malcolm after first sentence




Figure 5: Mind-map of Malcolm after two sentences

Figure 7: Mind-map of Malcolm after four sentences, with a depth of 2



6 Conclusions

The mind-map is a knowledge structure that continuously updates itself as long as text is
read. Representing the mind-map as a semantic network makes it possible to gather all the
actions, thoughts and states of being of one actor in a graphical representation. Through
temporal consolidation, changes over time can easily be captured in the mind-map. We are
currently working on two mind-map extensions that concern improved interaction. First, since
a main application is support for summarizing text streams that have been read, we are
building an automatic text-based summariser. The first (prototypical) version simply outputs
the concepts related to an actor, including the sub-concepts and the connections. As the
connections are stored syntactically unchanged, it is easy to generate sentences from them.
Second, a selective information retrieval engine is being developed by extending the
user/mind-map communication with an SQL-like interface. With it, we aim at queries like the
following:

select sub-concepts, concepts from mind-map
with depth=1
where concept = "Malcolm"

This yields a result set in which all concepts, sub-concepts, and associations of Malcolm are
retrieved. The depth clause specifies that only the neighbouring elements are considered. If
depth is set to 2 or higher, all components of the next-but-one level are retrieved as well.
A second query retrieves only the sub-concepts of Harry:

select sub-concepts, name from mind-map
where name = "Harry"

Figure 8: A Mind-map after having read 20 sentences.

More precisely, the following commands are currently being implemented:

• select: the projection that returns the concepts, sub-concepts, and associations to other
concepts.

• from: the selection of a mind-map; alternatively, several mind-maps can be addressed.

• with depth: the depth around a concept.

• where: the where clause allows various conditions to be set.
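Because the query language is still under implementation, the grammar below is only a guess inferred from the two example queries: a hypothetical `parse_query` that accepts `select ... from ... [with depth=N] [where key = "value"]`, splits comma-separated fields and mind-map names, and defaults to depth 1 when no depth is given (an assumption).

```python
import re

# Hypothetical grammar inferred from the two example queries.
QUERY = re.compile(
    r'^\s*select\s+(?P<fields>.+?)\s+from\s+(?P<sources>.+?)'
    r'(?:\s+with\s+depth\s*=\s*(?P<depth>\d+))?'
    r'(?:\s+where\s+(?P<key>[\w-]+)\s*=\s*"(?P<value>[^"]*)")?\s*$',
    re.IGNORECASE | re.DOTALL,
)


def parse_query(text: str) -> dict:
    """Split a query into its select, from, with depth and where parts."""
    m = QUERY.match(text)
    if m is None:
        raise ValueError("unsupported query: " + text)
    return {
        "select": [f.strip() for f in m.group("fields").split(",")],
        "from": [s.strip() for s in m.group("sources").split(",")],  # several mind-maps allowed
        "depth": int(m.group("depth")) if m.group("depth") else 1,   # default: assumption
        "where": (m.group("key"), m.group("value")) if m.group("key") else None,
    }
```

The parsed dictionary separates the four clauses listed above, so the projection, the mind-map selection, the depth and the condition can each be evaluated by a separate step.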

However, a current disadvantage of the mind-map is that it grows quickly and becomes very
large. With this implementation, texts with more than 500 sentences are still too large to
handle, so optimizing the existing solution is a future concern as well. Furthermore,
sentences may consist not of a single clause but of several clauses, which are either
independent or dependent. Independent clauses can stand alone as simple sentences and express
a complete thought. Dependent clauses, on the other hand, cannot express a complete thought
when standing alone as a sentence; they make no sense on their own, which is why they are
connected to an independent clause. This connection is lost in the mind-maps, so that some
branches going off an actor do not make much sense. The application also depends heavily on
the accuracy of the parser used during preprocessing. If the parser cannot identify the
subject(s), verb(s) and object(s) of a sentence, errors or gaps occur in the mind-maps.
Moreover, as the resolution of some pronouns depends on the correct processing of the
sentences, some pronouns may be wrongly resolved due to parser mistakes.